Geometric morphometrics versus DNA barcoding for the identification of malaria vectors Anopheles dirus and An. baimaii in the Thai-Cambodia border

Anopheles (Cellia) dirus Peyton & Harrison and Anopheles baimaii Sallum & Peyton are sibling species within the Dirus complex belonging to the Leucosphyrus group, and have been incriminated as primary vectors of malaria in Thailand. In the present study, DNA barcoding and geometric morphometrics were used to distinguish between An. dirus and An. baimaii in the international border areas, Trat Province, eastern Thailand. Our results revealed that DNA barcoding based on the cytochrome c oxidase subunit I gene could not be used to distinguish An. dirus from An. baimaii. The overlapping values between intra- and interspecific genetic divergence indicated no barcoding gap present for An. dirus and An. baimaii (ranging from 0 to 0.99%). However, the results of the geometric morphometric analysis based on the wing shape clearly distinguished An. dirus and An. baimaii, with 92.42% of specimens assigned to the correct species. We concluded that geometric morphometrics is an effective tool for the correct species identification of these two malaria vectors. Our findings could be used to make entomological surveillance information more accurate, leading to further effective mosquito control planning in Thailand and other countries in Southeast Asia.


Methods
Ethics statement. Our study was conducted in strict accordance with the guidelines and protocols for animal care and use of the Suan Sunandha Rajabhat University, Thailand. All experimental procedures involving mosquitoes were monitored and approved by the Ethics Committee of the Institutional Animal Care and Use Committee of Suan Sunandha Rajabhat University, Bangkok, Thailand (Approval No. IACOC 64-006/2021).

Study sites and mosquito collection.
Anopheles mosquitoes were collected from Trat Province, located in the eastern Thai-Cambodia border. Collection was carried out at two sites: the mainland Thai-Cambodia border (12°28′47.2ʺ N 102°40′51.8ʺ E, Dan Chumphon Sub-district, Bo Rai District), and the island Thai-Cambodia border (11°39′21.6ʺ N 102°34′52.6ʺ E, Koh Kood Sub-district, Koh Kood District [Kood island]) (Fig. 1A). Both sites are malaria hot spot areas, according to the malaria report of the Ministry of Public Health of Thailand (accessed at http:// malar ia. ddc. moph. go. th/). Ten BG-Pro CDC-style mosquito traps (BioGents, Regensbourg, Germany) supplemented with BG-lure cartridges (BioGents) and dry ice (solid carbon dioxide) were used at each border site to catch Anopheles mosquitoes (Fig. 1B). The mosquito collection was carried out simultaneously at the two sites over five consecutive nights monthly from 18:00 to 06:00 during June and August 2021. The field-collected mosquitoes were transported back to the laboratory at the College of Allied Health Sciences, Suan Sunandha Rajabhat University, Thailand for species identification. Before the identification of Anopheles species, all mosquito specimens were kept individually in 1.5 ml Eppendorf tubes and stored in a freezer at − 20 °C.
Mosquito identification. Anopheles mosquitoes were identified using a morphologic taxonomic key 17 42 were identified and confirmed using the multiplex PCR based on ribosomal DNA ITS2 species-specific primers, following previously published protocols [40][41][42] 45 to generate a consensus sequence for each individual Anopheles sample. All consensus sequences of An. dirus and An. baimaii were compared with those available in the global sequence database to confirm the species, including the NCBI GenBank using the Basic Local Alignment Search Tool (BLAST) and the Barcode of Life Data Systems (BOLD) using the BOLD Identification System. After that the COI sequences were automatically aligned using the Clustal X 46 , the genetic distances were computed, including intra-and interspecific genetic divergences based on the Kimura 2-parameter distance model (K2P) and phylogenetic trees constructed using MEGA X software 47 . The maximum likelihood (ML) and neighbor joining (NJ) methods were used for phylogenetic analyses. The node support of the phylogenetic tree was assessed using bootstrap analysis with 1000 replicates. The Tamura 3-parameter with Gamma distribution (G) was the most appropriate nucleotide substitution model for producing the ML phylogenetic tree.
Wing geometric morphometrics. All complete right wings of female An. dirus and An. baimaii were detached from the thorax, and the wing scales on the veins were removed and mounted on microscope slides under glass coverslips with Hoyer's mounting medium. The microscope slides of wings were then photographed using a digital camera attached to a Nikon SMZ 800 N stereo-microscope (Nikon Corp., Tokyo, Japan). Subsequently, the coordinates of 18 landmarks ( Fig. 2A) were digitized from each wing image. The landmark locations used in this study were based on previous research studies 48,49 . XY Online Morphometrics (XYOM) 50 , an online tool, was used for landmark digitization, GM analyses, and graphical outputs in this study. Ten images per species were selected at random and digitized twice by the same person to test the repeatability of the landmarks and evaluate the accuracy of landmark collection in a data set 51 . To assess allometry-the relationship between wing size and wing shape-linear regression of the first (shape derived) discriminant factor on the wing size was performed, and the coefficient of determination was subsequently estimated. www.nature.com/scientificreports/ For wing size analyses, the global wing size was estimated from the centroid size (CS), which is defined as the square root of the sum of the squared distances between each landmark and the centroid of the configuration 52 . The CS was calculated after the Procrustes superimposition of the landmark configurations. The wing CS difference between both species was analyzed using one-way ANOVA. The statistical significance of the oneway ANOVA was estimated using a non-parametric procedure (1000 runs) with a threshold of significance of p-value < 0.05. Discriminant analysis (or canonical variate analysis) was used to assess wing shape variation among species using a factor map and then the Mahalanobis distance was computed in order to estimate the divergence in shape between the species. The final shape variables were computed using Generalized Procrustes Analysis and used as input to a discriminant analysis. Wing shape differences based on the Mahalanobis distances between An. dirus and An. baimaii were estimated using a non-parametric procedure (1000 runs) at a p-value < 0.05. To assess the correct assignment of An. dirus and An. baimaii and confirm the reliability of the GM tool when distinguishing between the species based on wing shape, a cross-validated reclassification procedure was carried out based on allocating each individual into the closest group 53 . www.nature.com/scientificreports/

Results
Two hundred and sixty-four adult Anopheles mosquito samples, representing 11 species, of which three species belonged to the subgenus Anopheles and eight to the subgenus Cellia, were collected from the Thai-Cambodia border areas of Trat Province, Thailand during June and August 2021 (Table 1). Adult An. jamesii, An. philippinensis, and An. pseudojamesi specimens were morphologically identified. Other Anopheles mosquitoes were identified to the species using multiplex PCR. Ten Anopheles spp. were collected from the mainland border with the exception of An. baimaii, which was found in the island border (16.29% of the total Anopheles captured in this study). The PCR product of the multiplex PCR assay for the identification of member species in the Dirus complex is shown in Fig. 3A.

DNA barcoding for species identification.
After confirmation of the true species, partial sequences of the COI gene were amplified using universal barcode primers from ten randomly selected specimens of An. dirus and An. baimaii. No insertions, deletions, stop codons, or pseudogenes were found in any of the COI sequences, when the chromatograms were checked. The average nucleotide composition percentages of the COI sequences of An. dirus and An. baimaii were A = 30.3%, T = 39%, C = 15%, and G = 15.7%, and the sequences were AT rich, with an average AT content of 69.3%, higher than the GC content (30.7%). The average intraspecific genetic divergence percentages of An. dirus was 0.51%, ranging from 0% to 1.14%, and that of An. baimaii was 0.35%, ranging from 0% to 0.71%. The average interspecific divergence between the species was 0.47%, ranging from 0% to 0.99%. Thus, the intra-and interspecific divergence showed overlaps between the two Anopheles spp., indicating no barcoding gap. For identification based on the online genetic database, the COI sequences of An. dirus and An. baimaii from this study were compared with those available in the NCBI using BLAST, and in the BOLD database and were shown to have identity overlaps between the species.
Our COI sequences in this study, and several sequences available from GenBank, were used for construction of a phylogenetic tree ( Table 2). The ML tree showed the phylogenetic relationships of the members of the Dirus complex (Fig. 3B). All sequences of An. dirus and An. baimaii from this study and GenBank were unambiguously assigned to the same clade. Other members of the Dirus complex, including An. cracens, An. nemophilous, An. elegans, and An. scanloni showed a clear-cut separation between the species. Culex gelidus, as an outgroup species, was clearly separated from clusters of species members within the Dirus complex.
Geometric morphometrics for species identification. Thirty wings from An. dirus and thirty-six wings from An. baimaii were used for GM analyses. The precision and accuracy of the digitization of landmarks in the wing image sets showed good scores, based on their repeatability. The repeatability scores were more than 95% for size and shape. Assessment of the allometric effect showed that our sample exhibited no apparent relationship between size and shape (r 2 = 0.2%), which was apparent from the relatively straight line of the linear regression (Fig. 4A).
The wing CS variations of An. dirus and An. baimaii are shown in Fig. 4B. The mean wing CS of An. dirus was 2.89 mm, ranging from 2.68-3.11 mm, while An. baimaii was 2.90 mm, ranging from 2.74-3.07 mm (Table 3). No significant difference in wing CS between species was found (p > 0.05).
For wing shape analyses, visualizations of the wing shapes from the superposition of the mean landmark configurations were created before the extraction of the final shape variables (Fig. 2B). The superimposed landmark coordinates at positions 17 and 18 varied the most between An. dirus and An. baimaii (Fig. 2C). The factor map based on discriminant analysis of the wing shape showed that no samples of the species overlapped (Fig. 2D). The Mahalanobis distance between the species was 5.86, and was statistically significant (p < 0.05). These results www.nature.com/scientificreports/   (Table 4).

Discussion
A recent study by Pimnon and Bhumiratana (2018)    www.nature.com/scientificreports/ An. saeungae, and An. wejchoochotei, were collected from the Thai-Cambodia border. An increasing number of Anopheles spp. were reported in this survey, due to the use of molecular biology techniques in the investigation. In particular, sibling members within the Anopheles species complex are difficult to identify species by standard morphological methods 43,54 . In the present study, three sibling species of the Barbirostris complex, An. dissidens, An. saeungae, and An. wejchoochotei, and two sibling species of the Dirus complex, An. dirus and An. baimaii, were identified using species-specific multiplex PCR. This study is the first time that An. baimaii was found in an island area, Thailand. This Anopheles spp. has been reported to have a distribution restricted to some forested areas of central, southern and western Thailand, especially the Thai-Myanmar border 14 .
In Thailand, five species of the Dirus complex are reported: An. dirus (species A), An. cracens (species B), An. scanloni (species C), An. baimaii (species D), and An. nemophilous (species F) 9,13,16 . However, only An. baimaii and An. dirus are considered as the primary vectors of malaria in Thailand, and can be found along the Thai-Cambodia border. Tananchai et al. 2012 16 studied the biology of two Anopheles spp. in Pu Teuy Village, Kanchanaburi Province, western Thailand, and found that they had different patterns of feeding activity. Their study was the starting point for understanding the behavioral differences between the two vector species, and formed the basis of strategies for malaria control and prevention in areas in which malaria is endemic. However, a variety of effective alternative techniques are needed to aid in the identification of both species, according to the appropriate situation.
DNA barcoding is widely recognized as a modern molecular technique which can help identify unknown mosquito specimens to the species level 55 . However, the results of this study revealed that DNA barcoding based on the COI gene, as the "gold standard" barcode for animals 24,25 , could not be used to discriminate An. dirus from An. baimaii. The overlapping values for intra-and interspecific genetic divergence indicated that there was no barcoding gap for An. dirus and An. baimaii. The lack of a barcoding gap means that the genetic distances of our COI sequences cannot be used to identify An. dirus and An. baimaii, due to insufficient nucleotide differences making it impossible to delineate species boundaries 56 . This finding was consistent with phylogenetic analysis based on the COI sequences, which found that the species were not grouped separately into clades, while the other members of the Dirus complex, including An. cracens, An. nemophilous, An. elegans, and An. scanloni were clearly separated. Therefore, although this technique cannot separate An. dirus and An. baimaii, it can effectively classify the other species members of the Dirus complex. Our results are consistent with previous studies reporting that partial COI gene sequences failed in distinguishing between An. dirus and An. baimaii but can clearly distinguish other species of the Dirus complex and the Leucosphyrus group 57 . Often, this technique has failed to identify members in species complexes or closely related species, due to overlapping intra-and interspecific variation caused by incomplete lineage sorting between recently diverged species, such as some member species in the Anopheles strodei subgroup 58 .
In contrast, the results of GM based on wing shape analysis revealed an excellent differentiation between An. dirus and An. baimaii, with 92.42% of specimens correctly assigned based on a cross-validated reclassification procedure. The discriminant analysis identified the differences between the wing shapes of An. dirus and An. baimaii. Visual wing shape based on superposition of the mean landmark configurations showed an incompatibility between An. dirus and An. baimaii at landmark positions 17 and 18. These results suggest that the wing venation of each species is unique, leading to the successful identification based on wing GM analyses. The shape of the wing was recognized as a valuable variable for GM analyses in identifying mosquito species due to its stable character and distinctive structure 31,36 . These findings are in agreement with previous studies which found that GM techniques based on wing shape analyses can assist in the identification of members among the Anopheles species group and complex in Thailand, such as two species of the Minimus complex, An. minimus and An. harrisoni 39 , and three species of the Maculatus group, An. maculatus, An. sawadwongporni, and An. pseudowillmori 38 . Wing size is a variable that is not suitable for species identification by GM because it varies according to environmental influences 59,60 . Our analyses of the wing size based on CS in this study indicated that An. dirus and An. baimaii were not statistically different. Consistent with the results of assessing allometry, the size and shape of the wings had no relationship. Therefore, the wing size of An. dirus and An. baimaii was not used for species identification.

Conclusions
The findings of this study revealed that the GM technique can be used to effectively distinguish An. dirus from An. baimaii. This modern technique is valuable in laboratories with budget or molecular biology equipment constraints. However, the use of the GM technique should be revalidated each time in every other target area to prevent errors from morphological variations of mosquito specimens in response to environmental factors. In addition, further studies with a large number of specimens are needed to increase the confidence in this technique. For molecular techniques, our results show that DNA barcoding cannot distinguish An. dirus from An. baimaii due to the low difference in the nucleotide sequences of the COI gene of the species. However, the multiplex PCR method based on ITS2 remains the most effective in identifying both species. This study is an important guideline for identifying An. dirus and An. baimaii sibling species within the Dirus complex. These species are important malaria vectors in the border areas, and the ability to identify them will lead to the development of further effective strategies for the control of the Anopheles population in Thailand and other countries in Southeast Asia.